Sepsis subphenotyping based on organ dysfunction trajectory

Background Sepsis is a heterogeneous syndrome, and the identification of clinical subphenotypes is essential. Although organ dysfunction is a defining element of sepsis, subphenotypes of differential trajectory are not well studied. We sought to identify distinct Sequential Organ Failure Assessment (SOFA) score trajectory-based subphenotypes in sepsis. Methods We created 72-h SOFA score trajectories in patients with sepsis from four diverse intensive care unit (ICU) cohorts. We then used dynamic time warping (DTW) to compute heterogeneous SOFA trajectory similarities and hierarchical agglomerative clustering (HAC) to identify trajectory-based subphenotypes. Patient characteristics were compared between subphenotypes and a random forest model was developed to predict subphenotype membership at 6 and 24 h after being admitted to the ICU. The model was tested on three validation cohorts. Sensitivity analyses were performed with alternative clustering methodologies. Results A total of 4678, 3665, 12,282, and 4804 unique sepsis patients were included in development and three validation cohorts, respectively. Four subphenotypes were identified in the development cohort: Rapidly Worsening (n = 612, 13.1%), Delayed Worsening (n = 960, 20.5%), Rapidly Improving (n = 1932, 41.3%), and Delayed Improving (n = 1174, 25.1%). Baseline characteristics, including the pattern of organ dysfunction, varied between subphenotypes. Rapidly Worsening was defined by a higher comorbidity burden, acidosis, and visceral organ dysfunction. Rapidly Improving was defined by vasopressor use without acidosis. Outcomes differed across the subphenotypes, Rapidly Worsening had the highest in-hospital mortality (28.3%, P-value < 0.001), despite a lower SOFA (mean: 4.5) at ICU admission compared to Rapidly Improving (mortality:5.5%, mean SOFA: 5.5). An overall prediction accuracy of 0.78 (95% CI, [0.77, 0.8]) was obtained at 6 h after ICU admission, which increased to 0.87 (95% CI, [0.86, 0.88]) at 24 h. Similar subphenotypes were replicated in three validation cohorts. The majority of patients with sepsis have an improving phenotype with a lower mortality risk; however, they make up over 20% of all deaths due to their larger numbers. Conclusions Four novel, clinically-defined, trajectory-based sepsis subphenotypes were identified and validated. Identifying trajectory-based subphenotypes has immediate implications for the powering and predictive enrichment of clinical trials. Understanding the pathophysiology of these differential trajectories may reveal unanticipated therapeutic targets and identify more precise populations and endpoints for clinical trials. Supplementary Information The online version contains supplementary material available at 10.1186/s13054-022-04071-4.


Supplementary Materials
. Patient characteristics comparisons between survivors and nonsurvivors in the development cohort Table S2. Patient characteristics comparisons between survivors and nonsurvivors in the NMEDW validation cohort Table S3. Patient characteristics comparisons between survivors and nonsurvivors in the eICU validation cohort Table S4. Patient characteristics comparisons between survivors and nonsurvivors in the CEDAR validation cohort Table S5. Multiple clustering metrics statistics and cluster number determination Table S6. Patient physiological characteristics comparisons among subphenotypes in the development cohort Table S7. Patient basic characteristics comparisons among subphenotypes in the NMEDW validation cohort Table S8. Patient physiological characteristics comparisons among subphenotypes in the NMEDW validation cohort Table S9. Patient basic characteristics comparisons among subphenotypes in the eICU validation cohort Table S10. Patient physiological characteristics comparisons among subphenotypes in the eICU validation cohort Table S11. Patient basic characteristics comparisons among subphenotypes in the CEDAR validation cohort Table S12. Patient physiological characteristics comparisons among subphenotypes in the CEDAR validation cohort Table S13. Patient physiological characteristics comparisons among subphenotypes in terms of sensitivity analysis (GBTM) in the development cohort Table S14. Missing data (no., %) in cohorts for subphenotyping and predicting variables. Table S15. International classification of diseases, ninth revision, clinical modification codes used to identify infectious syndromes. Table S16. The associations between comorbidities and subphenotypes in the MIMIC-III development cohort Table S17. The associations between comorbidities and subphenotypes in the NMEDW validation cohort Table S18. The associations between comorbidities and subphenotypes in the eICU validation cohort Table S19. The associations between comorbidities and subphenotypes in the CEDAR validation cohort                   (ANCOVA) were used for the between-subphenotype comparisons, which were based on general linear model.
For each subphenotype, if a clinical variable median value (e.g., AST) is greater than the median of the development cohort, a ribbon is connected between the subphenotype and the group (e.g., Hepatic markers). Subphenotypes are demonstrated in different colors. If there are more clinical variables abnormal for that subphenotype, the ribbon would be more broader.

Appendix 6 The determination of the number of optimal clusters
Multiple clustering indexes were used to determine the optimal numbers of subphenotypes. We used 25 clustering index from Nbclust [2] to choose the number of clusters. The optimal number of clusters for each clustering index was shown in Table S5. Among all indices in development cohort: • 6 indices proposed 2 as the best number of clusters; • 3 indices proposed 3 as the best number of clusters; • 11 indices proposed 4 as the best number of clusters; • 3 indices proposed 5 as the best number of clusters; • 2 indices proposed 6 as the best number of clusters.
Among all indices in NMEDW validation cohort: • 9 indices proposed 2 as the best number of clusters; • 3 indices proposed 3 as the best number of clusters; • 10 indices proposed 4 as the best number of clusters; • 1 indice proposed 5 as the best number of clusters; • 2 indices proposed 6 as the best number of clusters.
Among all indices in eICU validation cohort: • 3 indices proposed 2 as the best number of clusters; • 4 indices proposed 3 as the best number of clusters; • 12 indices proposed 4 as the best number of clusters; • 3 indices proposed 5 as the best number of clusters; • 3 indices proposed 6 as the best number of clusters.
Among all indices in CEDAR validation cohort: • 0 indices proposed 2 as the best number of clusters; • 7 indices proposed 3 as the best number of clusters; • 14 indices proposed 4 as the best number of clusters; • 2 indices proposed 5 as the best number of clusters; • 2 indices proposed 6 as the best number of clusters.
According to the majority rule, the best number of clusters was set as 4 in experiment.

Appendix 7 Dynamic Time Warping (DTW) and Hierarchical Agglomerative Clustering (HAC)
Dynamic time warping (DTW) is one of most popular techniques to measure the similarity between two temporal sequences [3]. In DTW, the warping technique of one-to-many match is used to obtain the optimal match of two sequences so that the troughs and peaks with the same pattern are completely matched, and there is no left out for both temporal sequences. Once the matched sequences are obtained, the Euclidean Distance is used to compute the distance of two sequences. Due to its invariance against warping in the time axis, DTW has been widely utilized as a similarity measure in various domains such as speech recognition and provides more meaningful discrepancy measurements between two temporal sequences than other distance measures [4]. Previous clinical study in terms of COVID-19 has shown the DTW can be used to assess longitudinal changes in organ dysfunction and help clinicians or doctors to obtain important insights about pathophysiology [8].
Technically, DTW is a technique that compares sequences A and B through matching, where for every point in A it finds a best matching point in B. The distance between A and B is the sum of the distances between all matched point pairs. Mathematically, let ! = { " ! , # ! , . . . , $ ! ! } be the sequence of the i-th patient (1≤ ≤N). ! is the sequence length and % ! is its value at the k-th timestamp. In particular, given a sequence pair ( ! , & ), (see Figure  S19) for every data point % according to some ground distance metric ( % ! , %' & ), then the distance of the two sequence, . The matching process needs to satisfy 1) if = 1, then ′ = 1, which means the first points of the two sequences should match each other; 2) if = ! , then ′ = & , which means the last points of the two sequences should match each other; 3) the matching should be monotonic, i.e., if pairs and " < # , then ′ " ≤ ′ # . DTW matches ( ! , & ) by minimizing the sum of differences (measured by ground distance) between the all matched point pairs on the two sequences through dynamic programming [14]. After collecting all matched point pairs from ! and & , we get the warping path the number of entries in ( ! , & ) is its length. We can index the entries in ( ! , & ) according to the orders they appear in the path as In this study, after obtaining the SOFA trajectories of patients (see the Figure S20, the illustration of SOFA trajectory), each patient is represented as a vector of 12 SOFA scores from the first 6 hours to the last 6 hours across the 72 hours period after admission in ICU. DTW was used to evaluate the similarities between pairwise patient SOFA trajectories. Hierarchical Agglomerative Clustering (HAC) [13] is used to obtain subphenotypes based on the similarities of SOAF score trajectories derived by DTW. Compared to other clustering methods such as k-means, HAC has two advantages: (1) HAC is usually robust as it is not sensitive to data distribution and not necessary to have initialization procedure, which may cause uncertainty.
(2) HAC typically generates a tree diagram that can visually demonstrate how the data points are agglomerated together in a hierarchical manner and provides a visible way to assistance the determination of optimal cluster number. In order to furtherly determine the number of optimal cluster number, one famous R package "NbClust" [2] was used in this study. During using the "NbClust", we considered 25 indices. The more details in terms of these indices were shown in Table S5.

Appendix 8 Group-Based Trajectory Modeling (GBTM)
In order to assess the sensitivity to different clustering methods, in this study, we used GBTM to re-derive subphenotypes. The GBTM is one of the most popular latent class analysis (LCA) that assigns each patient a probability of being in each subphenotype by considering maximum likelihood estimation [5]. The GBTM has been widely used for researchers to identify underlying subgroups in terms of disease courses such as coronary artery risk development in young adults [6] [7]. During using GBTM to derive subphenotypes, the optimal subphenotype number was determined by comprehensively taking Akaike information criterion (AIC) [11] and Bayesian information criterion (BIC) [12] into account.